Overview

Dataset statistics

Number of variables: 29
Number of observations: 2018245
Missing cells: 17640061
Missing cells (%): 30.1%
Total size in memory: 446.5 MiB
Average record size in memory: 232.0 B
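
The derived figures in this block can be recomputed from the raw counts. The sketch below assumes that "Missing cells (%)" is missing cells over variables times observations, and that 1 MiB is 1024 * 1024 bytes. The variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 29
n_observations = 2_018_245
missing_cells = 17_640_061
total_size_mib = 446.5

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values agree with the reported 30.1% and 232.0 B after rounding to one decimal place.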

Variable types

DateTime: 1
Text: 16
Unsupported: 1
Numeric: 11

Alerts

BOROUGH has 627854 (31.1%) missing values [Missing]
ZIP CODE has 628092 (31.1%) missing values [Missing]
LATITUDE has 229685 (11.4%) missing values [Missing]
LONGITUDE has 229685 (11.4%) missing values [Missing]
LOCATION has 229685 (11.4%) missing values [Missing]
ON STREET NAME has 424807 (21.0%) missing values [Missing]
CROSS STREET NAME has 755532 (37.4%) missing values [Missing]
OFF STREET NAME has 1685810 (83.5%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 2 has 307909 (15.3%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 3 has 1875114 (92.9%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 4 has 1986122 (98.4%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 5 has 2009575 (99.6%) missing values [Missing]
VEHICLE TYPE CODE 2 has 376990 (18.7%) missing values [Missing]
VEHICLE TYPE CODE 3 has 1880098 (93.2%) missing values [Missing]
VEHICLE TYPE CODE 4 has 1987193 (98.5%) missing values [Missing]
VEHICLE TYPE CODE 5 has 2009835 (99.6%) missing values [Missing]
VEHICLE COMBINATION has 377001 (18.7%) missing values [Missing]
LATITUDE is highly skewed (γ1 = -20.42797789) [Skewed]
NUMBER OF PERSONS KILLED is highly skewed (γ1 = 34.05808743) [Skewed]
NUMBER OF PEDESTRIANS KILLED is highly skewed (γ1 = 41.90421138) [Skewed]
NUMBER OF CYCLIST KILLED is highly skewed (γ1 = 95.71982564) [Skewed]
NUMBER OF MOTORIST KILLED is highly skewed (γ1 = 54.57753588) [Skewed]
COLLISION_ID has unique values [Unique]
ZIP CODE is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
NUMBER OF PERSONS INJURED has 1568357 (77.7%) zeros [Zeros]
NUMBER OF PERSONS KILLED has 2015410 (99.9%) zeros [Zeros]
NUMBER OF PEDESTRIANS INJURED has 1911465 (94.7%) zeros [Zeros]
NUMBER OF PEDESTRIANS KILLED has 2016798 (99.9%) zeros [Zeros]
NUMBER OF CYCLIST INJURED has 1966117 (97.4%) zeros [Zeros]
NUMBER OF CYCLIST KILLED has 2018020 (> 99.9%) zeros [Zeros]
NUMBER OF MOTORIST INJURED has 1730540 (85.7%) zeros [Zeros]
NUMBER OF MOTORIST KILLED has 2017144 (99.9%) zeros [Zeros]
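
One way to act on the missing-value alerts is to shortlist columns that are almost entirely empty. The sketch below uses a subset of the counts listed in the alerts. The 90% cutoff and the helper name `mostly_missing` are hypothetical choices for illustration, not settings taken from the report.

```python
# Missing-value counts copied from a subset of the alerts.
N_ROWS = 2_018_245
missing_counts = {
    "BOROUGH": 627_854,
    "LATITUDE": 229_685,
    "OFF STREET NAME": 1_685_810,
    "VEHICLE TYPE CODE 2": 376_990,
    "CONTRIBUTING FACTOR VEHICLE 3": 1_875_114,
    "CONTRIBUTING FACTOR VEHICLE 4": 1_986_122,
    "CONTRIBUTING FACTOR VEHICLE 5": 2_009_575,
}

# Hypothetical cutoff chosen for this example only.
DROP_THRESHOLD = 0.90


def mostly_missing(counts, n_rows, threshold):
    """Return the columns whose missing share exceeds `threshold`, sorted by name."""
    return sorted(col for col, n in counts.items() if n / n_rows > threshold)


candidates = mostly_missing(missing_counts, N_ROWS, DROP_THRESHOLD)
print(candidates)
```

With this cutoff, OFF STREET NAME (83.5% missing) is kept, while the third to fifth contributing-factor columns are flagged.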

Reproduction

Analysis started: 2023-10-02 01:00:30.372385
Analysis finished: 2023-10-02 01:01:15.029112
Duration: 44.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
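
The reported duration follows from the two timestamps. A minimal check, assuming both timestamps share the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-10-02 01:00:30.372385")
finished = datetime.fromisoformat("2023-10-02 01:01:15.029112")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```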

Variables

Distinct: 1096801
Distinct (%): 54.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.4 MiB
Minimum: 2012-07-01 00:05:00
Maximum: 2023-08-15 23:59:00
Histogram with fixed size bins (bins=50)

BOROUGH
Text

MISSING 

Distinct: 5
Distinct (%): < 0.1%
Missing: 627854
Missing (%): 31.1%
Memory size: 15.4 MiB

Length

Max length: 13
Median length: 9
Mean length: 7.456125651
Min length: 5

Characters and Unicode

Total characters: 10366930
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BROOKLYN
2nd row: BROOKLYN
3rd row: BRONX
4th row: BROOKLYN
5th row: MANHATTAN
Value        Count     Frequency (%)
brooklyn     441026    30.4%
queens       372457    25.7%
manhattan    313266    21.6%
bronx        205345    14.2%
staten       58297     4.0%
island       58297     4.0%
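
The word table counts whitespace-separated tokens rather than whole values, which would explain why "staten" and "island" appear separately with identical counts and why the percentages do not match per-row shares. The sketch below rebuilds the word counts from per-borough row counts. Mapping each word count back to a full borough name is an inference from the sample rows, not something the report states.

```python
from collections import Counter

# Inferred per-borough row counts ("staten" and "island" are assumed to
# both come from the single value STATEN ISLAND).
borough_rows = {
    "BROOKLYN": 441_026,
    "QUEENS": 372_457,
    "MANHATTAN": 313_266,
    "BRONX": 205_345,
    "STATEN ISLAND": 58_297,
}

# Count each lowercase word once per row that contains it.
word_counts = Counter()
for name, rows in borough_rows.items():
    for word in name.lower().split():
        word_counts[word] += rows

total_words = sum(word_counts.values())
non_missing_rows = sum(borough_rows.values())
total_chars = sum(len(name) * rows for name, rows in borough_rows.items())

brooklyn_word_share = 100 * word_counts["brooklyn"] / total_words
mean_length = total_chars / non_missing_rows

print(f"brooklyn share of words: {brooklyn_word_share:.1f}%")
print(f"mean length: {mean_length:.4f}")
```

Under this reading the row counts sum to the number of non-missing observations (2018245 minus 627854), and the character total matches the reported 10366930.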

Most occurring characters

Value               Count      Frequency (%)
N                   1761954    17.0%
O                   1087397    10.5%
A                   1056392    10.2%
E                   803211     7.7%
T                   743126     7.2%
R                   646371     6.2%
B                   646371     6.2%
L                   499323     4.8%
S                   489051     4.7%
Y                   441026     4.3%
Other values (9)    2192708    21.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 10308633
99.4%
Space Separator 58297
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 1761954
17.1%
O 1087397
10.5%
A 1056392
10.2%
E 803211
 
7.8%
T 743126
 
7.2%
R 646371
 
6.3%
B 646371
 
6.3%
L 499323
 
4.8%
S 489051
 
4.7%
Y 441026
 
4.3%
Other values (8) 2134411
20.7%
Space Separator
ValueCountFrequency (%)
58297
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10308633
99.4%
Common 58297
 
0.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 1761954
17.1%
O 1087397
10.5%
A 1056392
10.2%
E 803211
 
7.8%
T 743126
 
7.2%
R 646371
 
6.3%
B 646371
 
6.3%
L 499323
 
4.8%
S 489051
 
4.7%
Y 441026
 
4.3%
Other values (8) 2134411
20.7%
Common
ValueCountFrequency (%)
58297
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10366930
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 1761954
17.0%
O 1087397
10.5%
A 1056392
10.2%
E 803211
 
7.7%
T 743126
 
7.2%
R 646371
 
6.2%
B 646371
 
6.2%
L 499323
 
4.8%
S 489051
 
4.7%
Y 441026
 
4.3%
Other values (9) 2192708
21.2%

ZIP CODE
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 628092
Missing (%): 31.1%
Memory size: 15.4 MiB
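
The report does not say why ZIP CODE was rejected as unsupported. One plausible cause, assumed here and not confirmed by the report, is a column that mixes numbers, strings, and blanks. A hypothetical `normalize_zip` helper for that case might look like this:

```python
import re


def normalize_zip(value):
    """Return a 5-digit ZIP string, or None when the value is unusable.

    Hypothetical helper: assumes the raw column may hold ints, floats,
    strings, blanks, or NaN.
    """
    if value is None:
        return None
    if isinstance(value, float):
        if value != value:  # NaN is the only float not equal to itself
            return None
        if value.is_integer():
            value = int(value)  # 11208.0 -> 11208
    text = str(value).strip()
    # Accept plain 5-digit codes and ZIP+4, keeping only the first 5 digits.
    match = re.fullmatch(r"(\d{5})(?:-\d{4})?", text)
    return match.group(1) if match else None


print(normalize_zip(11208.0), normalize_zip(" 10001-1234 "), normalize_zip("  "))
```

Returning strings rather than integers keeps the column a single, consistent type.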

LATITUDE
Real number (ℝ)

MISSING  SKEWED 

Distinct: 125750
Distinct (%): 7.0%
Missing: 229685
Missing (%): 11.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.62776338
Minimum: 0
Maximum: 43.344444
Zeros: 4235
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 15.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 40.596733
Q1: 40.667923
median: 40.721024
Q3: 40.7695595
95-th percentile: 40.8620661
Maximum: 43.344444
Range: 43.344444
Interquartile range (IQR): 0.1016365

Descriptive statistics

Standard deviation: 1.980900782
Coefficient of variation (CV): 0.04875731808
Kurtosis: 415.9795219
Mean: 40.62776338
Median Absolute Deviation (MAD): 0.0513
Skewness: -20.42797789
Sum: 72665192.48
Variance: 3.923967909
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                    Count      Frequency (%)
0                        4235       0.2%
40.861862                853        < 0.1%
40.696033                742        < 0.1%
40.8047                  691        < 0.1%
40.608757                671        < 0.1%
40.798256                627        < 0.1%
40.759308                613        < 0.1%
40.6960346               587        < 0.1%
40.675735                533        < 0.1%
40.658577                502        < 0.1%
Other values (125740)    1778506    88.1%
(Missing)                229685     11.4%
Lowest values
Value         Count    Frequency (%)
0             4235     0.2%
30.78418      1        < 0.1%
34.783634     1        < 0.1%
40.4989488    2        < 0.1%
40.4991346    1        < 0.1%

Highest values
Value         Count    Frequency (%)
43.344444     1        < 0.1%
42.64154      1        < 0.1%
42.318317     1        < 0.1%
42.107204     1        < 0.1%
41.91661      1        < 0.1%
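
The 4235 zero latitudes sit far from the median of 40.721024. If they are placeholders rather than real coordinates (an assumption, since the report does not say), they pull the mean down and would account for much of the negative skew. Because zeros add nothing to the sum, the mean without them can be recomputed from the reported totals alone:

```python
# Figures copied from the LATITUDE statistics.
n_rows = 2_018_245
n_missing = 229_685
n_zeros = 4_235
total = 72_665_192.48  # "Sum"

n_present = n_rows - n_missing
mean_with_zeros = total / n_present

# Zeros contribute nothing to the sum, so only the denominator changes.
mean_without_zeros = total / (n_present - n_zeros)

print(f"mean including zeros: {mean_with_zeros:.4f}")
print(f"mean excluding zeros: {mean_without_zeros:.4f}")
```

The first value reproduces the reported mean. The second lands much closer to the median.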

LONGITUDE
Real number (ℝ)

MISSING 

Distinct: 97829
Distinct (%): 5.5%
Missing: 229685
Missing (%): 11.4%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.75228388
Minimum: -201.35999
Maximum: 0
Zeros: 4235
Zeros (%): 0.2%
Negative: 1784325
Negative (%): 88.4%
Memory size: 15.4 MiB

Quantile statistics

Minimum-201.35999
5-th percentile-74.0349992
Q1-73.9749344
median-73.9273161
Q3-73.86665
95-th percentile-73.7631722
Maximum0
Range201.35999
Interquartile range (IQR)0.1082844

Descriptive statistics

Standard deviation3.727568036
Coefficient of variation (CV)-0.05054173023
Kurtosis441.0923234
Mean-73.75228388
Median Absolute Deviation (MAD)0.0526739
Skewness15.98140474
Sum-131910384.9
Variance13.89476346
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 4235
 
0.2%
-73.89063 738
 
< 0.1%
-73.91282 717
 
< 0.1%
-73.98453 698
 
< 0.1%
-74.038086 672
 
< 0.1%
-73.91243 652
 
< 0.1%
-73.89686 630
 
< 0.1%
-73.9845292 587
 
< 0.1%
-73.882744 560
 
< 0.1%
-73.94476 559
 
< 0.1%
Other values (97819) 1778512
88.1%
(Missing) 229685
 
11.4%
Lowest values
Value         Count    Frequency (%)
-201.35999    1        < 0.1%
-201.23706    105      < 0.1%
-89.13527     1        < 0.1%
-86.76847     1        < 0.1%
-79.61955     1        < 0.1%

Highest values
Value         Count    Frequency (%)
0             4235     0.2%
-32.768513    16       < 0.1%
-47.209625    3        < 0.1%
-73.66301     1        < 0.1%
-73.70055     2        < 0.1%
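
The extreme values show coordinates that cannot be valid, such as longitudes below -180, alongside the (0, 0) entries. A simple range filter is one way to flag them. The bounding box below is a rough, assumed envelope around New York City chosen for illustration, and `is_plausible_nyc_point` is a hypothetical helper name.

```python
# Rough bounding box around New York City. These limits are an assumption
# made for this example, not values taken from the report.
LAT_MIN, LAT_MAX = 40.4, 41.0
LON_MIN, LON_MAX = -74.3, -73.6


def is_plausible_nyc_point(lat, lon):
    """Return True when (lat, lon) falls inside the assumed bounding box."""
    if lat is None or lon is None:
        return False
    # NaN fails every comparison, so it is rejected here as well.
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


print(is_plausible_nyc_point(40.667202, -73.8665))
print(is_plausible_nyc_point(0.0, 0.0))
print(is_plausible_nyc_point(40.7, -201.35999))
```

Rows that fail the check could be set to missing instead of dropped, so the other columns stay usable.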

LOCATION
Text

MISSING 

Distinct274041
Distinct (%)15.3%
Missing229685
Missing (%)11.4%
Memory size15.4 MiB

Length

Max length25
Median length24
Mean length22.81198618
Min length10

Characters and Unicode

Total characters40800606
Distinct characters16
Distinct categories6 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique149817 ?
Unique (%)8.4%

Sample

1st row(40.667202, -73.8665)
2nd row(40.683304, -73.917274)
3rd row(40.709183, -73.956825)
4th row(40.86816, -73.83148)
5th row(40.67172, -73.8971)
Value                    Count      Frequency (%)
0.0                      8470       0.2%
40.861862                853        < 0.1%
40.696033                742        < 0.1%
73.89063                 738        < 0.1%
73.91282                 717        < 0.1%
73.98453                 698        < 0.1%
40.8047                  691        < 0.1%
74.038086                672        < 0.1%
40.608757                671        < 0.1%
73.91243                 652        < 0.1%
Other values (223568)    3562216    99.6%
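
LOCATION stores each coordinate pair as text, so the profiler tokenizes it into separate numbers. The token 0.0 occurs 8470 times, twice the 4235 zero latitudes, which is consistent with "(0.0, 0.0)" rows. A minimal sketch for parsing the sampled format back into numbers follows. `parse_location` is a hypothetical helper, and it assumes every usable value looks like the sample rows.

```python
import ast


def parse_location(text):
    """Parse a '(lat, lon)' string into a pair of floats, or None on failure."""
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError, AttributeError):
        # Malformed text, or a non-string input such as None.
        return None
    if (
        isinstance(value, tuple)
        and len(value) == 2
        and all(isinstance(v, (int, float)) for v in value)
    ):
        lat, lon = value
        return float(lat), float(lon)
    return None


print(parse_location("(40.667202, -73.8665)"))
```

`ast.literal_eval` only evaluates literals, so it is safer than `eval` for untrusted text.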

Most occurring characters

ValueCountFrequency (%)
7 4470331
11.0%
4 3869198
 
9.5%
. 3577120
 
8.8%
3 3403654
 
8.3%
0 3307996
 
8.1%
9 2629709
 
6.4%
8 2577886
 
6.3%
6 2544516
 
6.2%
5 2037315
 
5.0%
( 1788560
 
4.4%
Other values (6) 10594321
26.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 28284921
69.3%
Other Punctuation 5365680
 
13.2%
Open Punctuation 1788560
 
4.4%
Space Separator 1788560
 
4.4%
Close Punctuation 1788560
 
4.4%
Dash Punctuation 1784325
 
4.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 4470331
15.8%
4 3869198
13.7%
3 3403654
12.0%
0 3307996
11.7%
9 2629709
9.3%
8 2577886
9.1%
6 2544516
9.0%
5 2037315
7.2%
2 1739489
 
6.1%
1 1704827
 
6.0%
Other Punctuation
ValueCountFrequency (%)
. 3577120
66.7%
, 1788560
33.3%
Open Punctuation
ValueCountFrequency (%)
( 1788560
100.0%
Space Separator
ValueCountFrequency (%)
1788560
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1788560
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1784325
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 40800606
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
7 4470331
11.0%
4 3869198
 
9.5%
. 3577120
 
8.8%
3 3403654
 
8.3%
0 3307996
 
8.1%
9 2629709
 
6.4%
8 2577886
 
6.3%
6 2544516
 
6.2%
5 2037315
 
5.0%
( 1788560
 
4.4%
Other values (6) 10594321
26.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 40800606
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7 4470331
11.0%
4 3869198
 
9.5%
. 3577120
 
8.8%
3 3403654
 
8.3%
0 3307996
 
8.1%
9 2629709
 
6.4%
8 2577886
 
6.3%
6 2544516
 
6.2%
5 2037315
 
5.0%
( 1788560
 
4.4%
Other values (6) 10594321
26.0%

ON STREET NAME
Text

MISSING 

Distinct17990
Distinct (%)1.1%
Missing424807
Missing (%)21.0%
Memory size15.4 MiB

Length

Max length: 32
Median length: 32
Mean length: 30.02577948
Min length: 2
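
A median length equal to the maximum of 32, together with the space character making up 57.5% of all characters in this column, suggests that many street names are padded with whitespace to a fixed width. That is an inference from the statistics, not something the report states. A hypothetical `clean_street_name` helper that strips the padding and unifies letter case could look like this:

```python
import re


def clean_street_name(raw):
    """Collapse whitespace runs, trim the ends, and upper-case a street name."""
    if raw is None:
        return None
    cleaned = re.sub(r"\s+", " ", raw).strip().upper()
    # Treat values that were only whitespace as missing.
    return cleaned or None


print(repr(clean_street_name("WHITESTONE EXPRESSWAY           ")))
```

Normalizing before counting distinct values would merge names that differ only in padding or case.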

Characters and Unicode

Total characters47844218
Distinct characters75
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6346 ?
Unique (%)0.4%

Sample

1st rowWHITESTONE EXPRESSWAY
2nd rowQUEENSBORO BRIDGE UPPER
3rd rowTHROGS NECK BRIDGE
4th rowSARATOGA AVENUE
5th rowMAJOR DEEGAN EXPRESSWAY RAMP
ValueCountFrequency (%)
avenue 593450
 
16.1%
street 509147
 
13.9%
east 150248
 
4.1%
boulevard 124118
 
3.4%
west 112399
 
3.1%
parkway 71852
 
2.0%
road 66512
 
1.8%
expressway 60732
 
1.7%
island 29161
 
0.8%
queens 26387
 
0.7%
Other values (5367) 1931980
52.6%

Most occurring characters

ValueCountFrequency (%)
27508136
57.5%
E 3581653
 
7.5%
A 1899395
 
4.0%
T 1789078
 
3.7%
R 1624994
 
3.4%
N 1389933
 
2.9%
S 1371656
 
2.9%
U 953525
 
2.0%
O 846204
 
1.8%
V 830996
 
1.7%
Other values (65) 6048648
 
12.6%

Most occurring categories

ValueCountFrequency (%)
Space Separator 27508136
57.5%
Uppercase Letter 19062728
39.8%
Decimal Number 1147165
 
2.4%
Lowercase Letter 115400
 
0.2%
Other Punctuation 4436
 
< 0.1%
Open Punctuation 3091
 
< 0.1%
Close Punctuation 3087
 
< 0.1%
Dash Punctuation 173
 
< 0.1%
Math Symbol 1
 
< 0.1%
Control 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 3581653
18.8%
A 1899395
10.0%
T 1789078
9.4%
R 1624994
 
8.5%
N 1389933
 
7.3%
S 1371656
 
7.2%
U 953525
 
5.0%
O 846204
 
4.4%
V 830996
 
4.4%
L 625583
 
3.3%
Other values (16) 4149711
21.8%
Lowercase Letter
ValueCountFrequency (%)
e 15435
13.4%
r 10192
 
8.8%
n 9717
 
8.4%
a 9624
 
8.3%
t 8411
 
7.3%
s 7077
 
6.1%
o 6798
 
5.9%
y 5680
 
4.9%
l 5346
 
4.6%
d 4456
 
3.9%
Other values (16) 32664
28.3%
Decimal Number
ValueCountFrequency (%)
1 260820
22.7%
3 129868
11.3%
2 128207
11.2%
4 108794
9.5%
5 106506
9.3%
6 93244
 
8.1%
8 86291
 
7.5%
7 84717
 
7.4%
9 75661
 
6.6%
0 73057
 
6.4%
Other Punctuation
ValueCountFrequency (%)
. 3275
73.8%
/ 1024
 
23.1%
& 62
 
1.4%
' 37
 
0.8%
# 16
 
0.4%
, 16
 
0.4%
@ 6
 
0.1%
Space Separator
ValueCountFrequency (%)
27508136
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3091
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3087
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 173
100.0%
Math Symbol
ValueCountFrequency (%)
> 1
100.0%
Control
ValueCountFrequency (%)
 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 28666090
59.9%
Latin 19178128
40.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 3581653
18.7%
A 1899395
9.9%
T 1789078
9.3%
R 1624994
 
8.5%
N 1389933
 
7.2%
S 1371656
 
7.2%
U 953525
 
5.0%
O 846204
 
4.4%
V 830996
 
4.3%
L 625583
 
3.3%
Other values (42) 4265111
22.2%
Common
ValueCountFrequency (%)
27508136
96.0%
1 260820
 
0.9%
3 129868
 
0.5%
2 128207
 
0.4%
4 108794
 
0.4%
5 106506
 
0.4%
6 93244
 
0.3%
8 86291
 
0.3%
7 84717
 
0.3%
9 75661
 
0.3%
Other values (13) 83846
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 47844218
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
27508136
57.5%
E 3581653
 
7.5%
A 1899395
 
4.0%
T 1789078
 
3.7%
R 1624994
 
3.4%
N 1389933
 
2.9%
S 1371656
 
2.9%
U 953525
 
2.0%
O 846204
 
1.8%
V 830996
 
1.7%
Other values (65) 6048648
 
12.6%

CROSS STREET NAME
Text

MISSING 

Distinct20039
Distinct (%)1.6%
Missing755532
Missing (%)37.4%
Memory size15.4 MiB

Length

Max length32
Median length32
Mean length22.92086008
Min length1

Characters and Unicode

Total characters28942468
Distinct characters76
Distinct categories12 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6127 ?
Unique (%)0.5%

Sample

1st row20 AVENUE
2nd rowDECATUR STREET
3rd rowEAST 43 STREET
4th rowEAST GATE PLAZA
5th rowwest 80 street -west 81 street
ValueCountFrequency (%)
avenue 552823
 
19.8%
street 449789
 
16.1%
east 109830
 
3.9%
west 70043
 
2.5%
boulevard 66991
 
2.4%
road 54298
 
1.9%
place 33223
 
1.2%
parkway 25933
 
0.9%
3 18440
 
0.7%
park 17087
 
0.6%
Other values (5466) 1394699
49.9%

Most occurring characters

ValueCountFrequency (%)
14081661
48.7%
E 2872978
 
9.9%
T 1422548
 
4.9%
A 1387639
 
4.8%
R 1121792
 
3.9%
N 1050486
 
3.6%
S 967745
 
3.3%
U 759721
 
2.6%
V 692944
 
2.4%
O 565260
 
2.0%
Other values (66) 4019694
 
13.9%

Most occurring categories

ValueCountFrequency (%)
Space Separator 14081661
48.7%
Uppercase Letter 13751087
47.5%
Decimal Number 1048182
 
3.6%
Lowercase Letter 61192
 
0.2%
Other Punctuation 307
 
< 0.1%
Dash Punctuation 27
 
< 0.1%
Open Punctuation 3
 
< 0.1%
Close Punctuation 3
 
< 0.1%
Control 2
 
< 0.1%
Math Symbol 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 2872978
20.9%
T 1422548
10.3%
A 1387639
10.1%
R 1121792
 
8.2%
N 1050486
 
7.6%
S 967745
 
7.0%
U 759721
 
5.5%
V 692944
 
5.0%
O 565260
 
4.1%
L 427447
 
3.1%
Other values (16) 2482527
18.1%
Lowercase Letter
ValueCountFrequency (%)
e 11430
18.7%
t 6384
10.4%
a 6003
9.8%
r 5041
 
8.2%
n 4340
 
7.1%
s 4008
 
6.5%
o 2923
 
4.8%
v 2850
 
4.7%
u 2502
 
4.1%
l 2194
 
3.6%
Other values (16) 13517
22.1%
Decimal Number
ValueCountFrequency (%)
1 232122
22.1%
2 123492
11.8%
3 115117
11.0%
4 94619
9.0%
5 94388
9.0%
8 83322
 
7.9%
7 83178
 
7.9%
6 82662
 
7.9%
9 71902
 
6.9%
0 67380
 
6.4%
Other Punctuation
ValueCountFrequency (%)
/ 127
41.4%
. 71
23.1%
& 52
16.9%
' 51
16.6%
? 3
 
1.0%
, 3
 
1.0%
Space Separator
ValueCountFrequency (%)
14081661
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 27
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Control
ValueCountFrequency (%)
 2
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2
100.0%
Other Symbol
ValueCountFrequency (%)
� 1
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15130189
52.3%
Latin 13812279
47.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 2872978
20.8%
T 1422548
10.3%
A 1387639
10.0%
R 1121792
 
8.1%
N 1050486
 
7.6%
S 967745
 
7.0%
U 759721
 
5.5%
V 692944
 
5.0%
O 565260
 
4.1%
L 427447
 
3.1%
Other values (42) 2543719
18.4%
Common
ValueCountFrequency (%)
14081661
93.1%
1 232122
 
1.5%
2 123492
 
0.8%
3 115117
 
0.8%
4 94619
 
0.6%
5 94388
 
0.6%
8 83322
 
0.6%
7 83178
 
0.5%
6 82662
 
0.5%
9 71902
 
0.5%
Other values (14) 67726
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 28942467
> 99.9%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14081661
48.7%
E 2872978
 
9.9%
T 1422548
 
4.9%
A 1387639
 
4.8%
R 1121792
 
3.9%
N 1050486
 
3.6%
S 967745
 
3.3%
U 759721
 
2.6%
V 692944
 
2.4%
O 565260
 
2.0%
Other values (65) 4019693
 
13.9%
Specials
ValueCountFrequency (%)
� 1
100.0%

OFF STREET NAME
Text

MISSING 

Distinct215352
Distinct (%)64.8%
Missing1685810
Missing (%)83.5%
Memory size15.4 MiB

Length

Max length40
Median length40
Mean length36.62444388
Min length8

Characters and Unicode

Total characters12175247
Distinct characters84
Distinct categories12 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique168284 ?
Unique (%)50.6%

Sample

1st row1211 LORING AVENUE
2nd row344 BAYCHESTER AVENUE
3rd row2047 PITKIN AVENUE
4th row480 DEAN STREET
5th row878 FLATBUSH AVENUE
ValueCountFrequency (%)
avenue 131712
 
11.9%
street 119606
 
10.8%
east 31610
 
2.9%
west 22819
 
2.1%
boulevard 21244
 
1.9%
road 15677
 
1.4%
lot 7881
 
0.7%
parking 7267
 
0.7%
of 6915
 
0.6%
parkway 6580
 
0.6%
Other values (27326) 736012
66.5%

Most occurring characters

ValueCountFrequency (%)
6748483
55.4%
E 760214
 
6.2%
T 416075
 
3.4%
A 391170
 
3.2%
R 324575
 
2.7%
N 285899
 
2.3%
S 272712
 
2.2%
1 263953
 
2.2%
U 194023
 
1.6%
O 181774
 
1.5%
Other values (74) 2336369
 
19.2%

Most occurring categories

ValueCountFrequency (%)
Space Separator 6748483
55.4%
Uppercase Letter 3928966
32.3%
Decimal Number 1381320
 
11.3%
Dash Punctuation 78402
 
0.6%
Lowercase Letter 23864
 
0.2%
Other Punctuation 9578
 
0.1%
Open Punctuation 2311
 
< 0.1%
Close Punctuation 2300
 
< 0.1%
Modifier Symbol 17
 
< 0.1%
Connector Punctuation 3
 
< 0.1%
Other values (2) 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 760214
19.3%
T 416075
10.6%
A 391170
10.0%
R 324575
8.3%
N 285899
 
7.3%
S 272712
 
6.9%
U 194023
 
4.9%
O 181774
 
4.6%
V 181309
 
4.6%
L 137102
 
3.5%
Other values (16) 784113
20.0%
Lowercase Letter
ValueCountFrequency (%)
e 3971
16.6%
t 2786
11.7%
r 2231
9.3%
a 2094
 
8.8%
n 1570
 
6.6%
s 1551
 
6.5%
o 1262
 
5.3%
v 1026
 
4.3%
d 963
 
4.0%
l 954
 
4.0%
Other values (16) 5456
22.9%
Other Punctuation
ValueCountFrequency (%)
/ 6431
67.1%
& 1740
 
18.2%
. 1001
 
10.5%
@ 145
 
1.5%
, 83
 
0.9%
: 59
 
0.6%
# 54
 
0.6%
' 50
 
0.5%
* 8
 
0.1%
? 3
 
< 0.1%
Other values (2) 4
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 263953
19.1%
2 179334
13.0%
0 155779
11.3%
3 140846
10.2%
5 139658
10.1%
4 123406
8.9%
6 100790
 
7.3%
7 98482
 
7.1%
8 92988
 
6.7%
9 86084
 
6.2%
Close Punctuation
ValueCountFrequency (%)
) 2299
> 99.9%
] 1
 
< 0.1%
Control
ValueCountFrequency (%)
1
50.0%
 1
50.0%
Space Separator
ValueCountFrequency (%)
6748483
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 78402
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2311
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 17
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Math Symbol
ValueCountFrequency (%)
= 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 8222417
67.5%
Latin 3952830
32.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 760214
19.2%
T 416075
10.5%
A 391170
9.9%
R 324575
8.2%
N 285899
 
7.2%
S 272712
 
6.9%
U 194023
 
4.9%
O 181774
 
4.6%
V 181309
 
4.6%
L 137102
 
3.5%
Other values (42) 807977
20.4%
Common
ValueCountFrequency (%)
6748483
82.1%
1 263953
 
3.2%
2 179334
 
2.2%
0 155779
 
1.9%
3 140846
 
1.7%
5 139658
 
1.7%
4 123406
 
1.5%
6 100790
 
1.2%
7 98482
 
1.2%
8 92988
 
1.1%
Other values (22) 178698
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12175247
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6748483
55.4%
E 760214
 
6.2%
T 416075
 
3.4%
A 391170
 
3.2%
R 324575
 
2.7%
N 285899
 
2.3%
S 272712
 
2.2%
1 263953
 
2.2%
U 194023
 
1.6%
O 181774
 
1.5%
Other values (74) 2336369
 
19.2%

NUMBER OF PERSONS INJURED
Real number (ℝ)

ZEROS 

Distinct31
Distinct (%)< 0.1%
Missing18
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean0.3024248511
Minimum0
Maximum43
Zeros1568357
Zeros (%)77.7%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum43
Range43
Interquartile range (IQR)0

Descriptive statistics

Standard deviation: 0.6937633069
Coefficient of variation (CV): 2.29400231
Kurtosis: 52.81943505
Mean: 0.3024248511
Median Absolute Deviation (MAD): 0
Skewness: 4.322086162
Sum: 610362
Variance: 0.481307526
Monotonicity: Not monotonic
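
Several of these statistics are related to one another. The check below assumes the usual definitions (mean as sum over non-missing rows, coefficient of variation as standard deviation over mean, variance as the squared standard deviation) and recomputes them from the reported figures.

```python
# Figures copied from the NUMBER OF PERSONS INJURED statistics.
n_rows = 2_018_245
n_missing = 18
total = 610_362      # "Sum"
std = 0.6937633069   # "Standard deviation"

mean = total / (n_rows - n_missing)
cv = std / mean          # coefficient of variation
variance = std ** 2

print(f"mean={mean:.10f} cv={cv:.8f} variance={variance:.9f}")
```

A coefficient of variation above 2 reflects how strongly the counts concentrate at zero, with a long tail of rare multi-injury collisions.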
Histogram with fixed size bins (bins=31)
ValueCountFrequency (%)
0 1568357
77.7%
1 349056
 
17.3%
2 65772
 
3.3%
3 21490
 
1.1%
4 8002
 
0.4%
5 3107
 
0.2%
6 1285
 
0.1%
7 552
 
< 0.1%
8 243
 
< 0.1%
9 120
 
< 0.1%
Other values (21) 243
 
< 0.1%
ValueCountFrequency (%)
0 1568357
77.7%
1 349056
 
17.3%
2 65772
 
3.3%
3 21490
 
1.1%
4 8002
 
0.4%
ValueCountFrequency (%)
43 1
< 0.1%
40 1
< 0.1%
34 1
< 0.1%
32 1
< 0.1%
31 1
< 0.1%

NUMBER OF PERSONS KILLED
Real number (ℝ)

SKEWED  ZEROS 

Distinct7
Distinct (%)< 0.1%
Missing31
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean0.001446328288
Minimum0
Maximum8
Zeros2015410
Zeros (%)99.9%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum8
Range8
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.04007201236
Coefficient of variation (CV)27.70602821
Kurtosis1973.851717
Mean0.001446328288
Median Absolute Deviation (MAD)0
Skewness34.05808743
Sum2919
Variance0.001605766174
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value        Count      Frequency (%)
0            2015410    99.9%
1            2716       0.1%
2            71         < 0.1%
3            12         < 0.1%
4            3          < 0.1%
8            1          < 0.1%
5            1          < 0.1%
(Missing)    31         < 0.1%

Lowest values
Value        Count      Frequency (%)
0            2015410    99.9%
1            2716       0.1%
2            71         < 0.1%
3            12         < 0.1%
4            3          < 0.1%

Highest values
Value        Count      Frequency (%)
8            1          < 0.1%
5            1          < 0.1%
4            3          < 0.1%
3            12         < 0.1%
2            71         < 0.1%
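
Because this variable takes only seven distinct values, its summary statistics can be rebuilt from the frequency table. The sketch below computes the skewness as the third central moment divided by the 1.5 power of the second. This population formula is an assumption: the report may apply a small-sample correction, which is negligible at this sample size.

```python
# Value -> row count, from the frequency table (the 31 missing rows are excluded).
counts = {0: 2_015_410, 1: 2_716, 2: 71, 3: 12, 4: 3, 5: 1, 8: 1}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n


def central_moment(k):
    """k-th central moment of the distribution described by `counts`."""
    return sum(count * (value - mean) ** k for value, count in counts.items()) / n


skewness = central_moment(3) / central_moment(2) ** 1.5

print(f"n={n} sum={total} skewness={skewness:.3f}")
```

The large positive skew comes from the few non-zero rows, since more than 99.8% of the observations are zero.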

NUMBER OF PEDESTRIANS INJURED
Real number (ℝ)

ZEROS 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.05518507416
Minimum0
Maximum27
Zeros1911465
Zeros (%)94.7%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum27
Range27
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.2412866552
Coefficient of variation (CV)4.372317314
Kurtosis137.052183
Mean0.05518507416
Median Absolute Deviation (MAD)0
Skewness5.801458899
Sum111377
Variance0.05821925
MonotonicityNot monotonic
Histogram with fixed size bins (bins=14)
ValueCountFrequency (%)
0 1911465
94.7%
1 102865
 
5.1%
2 3466
 
0.2%
3 344
 
< 0.1%
4 59
 
< 0.1%
5 25
 
< 0.1%
6 11
 
< 0.1%
7 3
 
< 0.1%
9 2
 
< 0.1%
27 1
 
< 0.1%
Other values (4) 4
 
< 0.1%
ValueCountFrequency (%)
0 1911465
94.7%
1 102865
 
5.1%
2 3466
 
0.2%
3 344
 
< 0.1%
4 59
 
< 0.1%
ValueCountFrequency (%)
27 1
< 0.1%
19 1
< 0.1%
15 1
< 0.1%
13 1
< 0.1%
9 2
< 0.1%

NUMBER OF PEDESTRIANS KILLED
Real number (ℝ)

SKEWED  ZEROS 

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.0007253826964
Minimum0
Maximum6
Zeros2016798
Zeros (%)99.9%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum6
Range6
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.02741555777
Coefficient of variation (CV)37.79461229
Kurtosis2555.389013
Mean0.0007253826964
Median Absolute Deviation (MAD)0
Skewness41.90421138
Sum1464
Variance0.0007516128078
MonotonicityNot monotonic
Histogram with fixed size bins (bins=4)
ValueCountFrequency (%)
0 2016798
99.9%
1 1434
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%
ValueCountFrequency (%)
0 2016798
99.9%
1 1434
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%
ValueCountFrequency (%)
6 1
 
< 0.1%
2 12
 
< 0.1%
1 1434
 
0.1%
0 2016798
99.9%

NUMBER OF CYCLIST INJURED
Real number (ℝ)

ZEROS 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.02612467763
Minimum0
Maximum4
Zeros1966117
Zeros (%)97.4%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum4
Range4
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.1614266671
Coefficient of variation (CV)6.179087429
Kurtosis38.34362648
Mean0.02612467763
Median Absolute Deviation (MAD)0
Skewness6.17799342
Sum52726
Variance0.02605856886
MonotonicityNot monotonic
Histogram with fixed size bins (bins=5)
ValueCountFrequency (%)
0 1966117
97.4%
1 51553
 
2.6%
2 553
 
< 0.1%
3 21
 
< 0.1%
4 1
 
< 0.1%
ValueCountFrequency (%)
0 1966117
97.4%
1 51553
 
2.6%
2 553
 
< 0.1%
3 21
 
< 0.1%
4 1
 
< 0.1%
ValueCountFrequency (%)
4 1
 
< 0.1%
3 21
 
< 0.1%
2 553
 
< 0.1%
1 51553
 
2.6%
0 1966117
97.4%

NUMBER OF CYCLIST KILLED
Real number (ℝ)

SKEWED  ZEROS 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.0001119784763
Minimum0
Maximum2
Zeros2018020
Zeros (%)> 99.9%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum2
Range2
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.01062812086
Coefficient of variation (CV)94.91217608
Kurtosis9312.90116
Mean0.0001119784763
Median Absolute Deviation (MAD)0
Skewness95.71982564
Sum226
Variance0.0001129569531
MonotonicityNot monotonic
Histogram with fixed size bins (bins=3)
ValueCountFrequency (%)
0 2018020
> 99.9%
1 224
 
< 0.1%
2 1
 
< 0.1%
ValueCountFrequency (%)
0 2018020
> 99.9%
1 224
 
< 0.1%
2 1
 
< 0.1%
ValueCountFrequency (%)
2 1
 
< 0.1%
1 224
 
< 0.1%
0 2018020
> 99.9%

NUMBER OF MOTORIST INJURED
Real number (ℝ)

ZEROS 

Distinct30
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.2179888963
Minimum0
Maximum43
Zeros1730540
Zeros (%)85.7%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum43
Range43
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.6549699539
Coefficient of variation (CV)3.004602368
Kurtosis65.55690898
Mean0.2179888963
Median Absolute Deviation (MAD)0
Skewness5.193745228
Sum439955
Variance0.4289856406
MonotonicityNot monotonic
Histogram with fixed size bins (bins=30)
ValueCountFrequency (%)
0 1730540
85.7%
1 193508
 
9.6%
2 60096
 
3.0%
3 20851
 
1.0%
4 7843
 
0.4%
5 3056
 
0.2%
6 1240
 
0.1%
7 527
 
< 0.1%
8 234
 
< 0.1%
9 116
 
< 0.1%
Other values (20) 234
 
< 0.1%
ValueCountFrequency (%)
0 1730540
85.7%
1 193508
 
9.6%
2 60096
 
3.0%
3 20851
 
1.0%
4 7843
 
0.4%
ValueCountFrequency (%)
43 1
< 0.1%
40 1
< 0.1%
34 1
< 0.1%
31 1
< 0.1%
30 1
< 0.1%

NUMBER OF MOTORIST KILLED
Real number (ℝ)

SKEWED  ZEROS 

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.0005896211808
Minimum0
Maximum5
Zeros2017144
Zeros (%)99.9%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum5
Range5
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.02648116975
Coefficient of variation (CV)44.91217517
Kurtosis4042.602393
Mean0.0005896211808
Median Absolute Deviation (MAD)0
Skewness54.57753588
Sum1190
Variance0.0007012523514
MonotonicityNot monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
0 2017144
99.9%
1 1031
 
0.1%
2 55
 
< 0.1%
3 12
 
< 0.1%
4 2
 
< 0.1%
5 1
 
< 0.1%
ValueCountFrequency (%)
0 2017144
99.9%
1 1031
 
0.1%
2 55
 
< 0.1%
3 12
 
< 0.1%
4 2
 
< 0.1%
ValueCountFrequency (%)
5 1
 
< 0.1%
4 2
 
< 0.1%
3 12
 
< 0.1%
2 55
 
< 0.1%
1 1031
0.1%
Distinct61
Distinct (%)< 0.1%
Missing6348
Missing (%)0.3%
Memory size15.4 MiB

Length

Max length53
Median length43
Mean length19.45329905
Min length1

Characters and Unicode

Total characters39138034
Distinct characters55
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowAggressive Driving/Road Rage
2nd rowPavement Slippery
3rd rowFollowing Too Closely
4th rowUnspecified
5th rowUnspecified
ValueCountFrequency (%)
unspecified 692736
17.3%
driver 432536
 
10.8%
inattention/distraction 401262
 
10.0%
too 157315
 
3.9%
closely 157315
 
3.9%
to 143244
 
3.6%
failure 125196
 
3.1%
yield 119166
 
3.0%
right-of-way 119166
 
3.0%
following 107467
 
2.7%
Other values (96) 1540479
38.6%
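
The word table above appears to count lowercased, whitespace-separated tokens: compound tokens such as `inattention/distraction` and `right-of-way` are kept whole. The sketch below reproduces that style of count on the five sample rows only. The tokenization rule is an assumption inferred from the table, and `word_counts` is a hypothetical helper, not the profiler's API.

```python
from collections import Counter


def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# The five sample rows listed for this variable.
sample_rows = [
    "Aggressive Driving/Road Rage",
    "Pavement Slippery",
    "Following Too Closely",
    "Unspecified",
    "Unspecified",
]

counts = word_counts(sample_rows)
for word, count in counts.most_common():
    print(word, count)
```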

Most occurring characters

ValueCountFrequency (%)
i 4413995
 
11.3%
e 3990869
 
10.2%
n 3401289
 
8.7%
t 2707853
 
6.9%
o 2302244
 
5.9%
r 2289239
 
5.8%
s 2040012
 
5.2%
1983985
 
5.1%
a 1925256
 
4.9%
c 1515443
 
3.9%
Other values (45) 12567849
32.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 31977468
81.7%
Uppercase Letter 4423958
 
11.3%
Space Separator 1983985
 
5.1%
Other Punctuation 508165
 
1.3%
Dash Punctuation 240030
 
0.6%
Open Punctuation 2108
 
< 0.1%
Close Punctuation 2108
 
< 0.1%
Decimal Number 212
 
< 0.1%
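
A category table like the one above can be approximated with Python's standard `unicodedata` module, which returns the two-letter general category of each character (`Lu` for Uppercase Letter, `Ll` for Lowercase Letter, `Zs` for Space Separator, `Po` for Other Punctuation, and so on). This is a minimal sketch on the five sample rows. The report itself may use a different library, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample rows listed for this variable.
sample_rows = [
    "Aggressive Driving/Road Rage",
    "Pavement Slippery",
    "Following Too Closely",
    "Unspecified",
    "Unspecified",
]

cat_counts = category_counts(sample_rows)
for category, count in cat_counts.most_common():
    print(category, count)
```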

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 4413995
13.8%
e 3990869
12.5%
n 3401289
10.6%
t 2707853
8.5%
o 2302244
 
7.2%
r 2289239
 
7.2%
s 2040012
 
6.4%
a 1925256
 
6.0%
c 1515443
 
4.7%
l 1207962
 
3.8%
Other values (15) 6183306
19.3%
Uppercase Letter
ValueCountFrequency (%)
D 976406
22.1%
U 910860
20.6%
I 569111
12.9%
F 287918
 
6.5%
C 276081
 
6.2%
T 246158
 
5.6%
P 178719
 
4.0%
R 162979
 
3.7%
L 129923
 
2.9%
W 120252
 
2.7%
Other values (12) 565551
12.8%
Decimal Number
ValueCountFrequency (%)
8 101
47.6%
0 101
47.6%
1 10
 
4.7%
Space Separator
ValueCountFrequency (%)
1983985
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 508165
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 240030
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2108
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2108
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 36401426
93.0%
Common 2736608
 
7.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 4413995
12.1%
e 3990869
 
11.0%
n 3401289
 
9.3%
t 2707853
 
7.4%
o 2302244
 
6.3%
r 2289239
 
6.3%
s 2040012
 
5.6%
a 1925256
 
5.3%
c 1515443
 
4.2%
l 1207962
 
3.3%
Other values (37) 10607264
29.1%
Common
ValueCountFrequency (%)
1983985
72.5%
/ 508165
 
18.6%
- 240030
 
8.8%
( 2108
 
0.1%
) 2108
 
0.1%
8 101
 
< 0.1%
0 101
 
< 0.1%
1 10
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 39138034
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 4413995
 
11.3%
e 3990869
 
10.2%
n 3401289
 
8.7%
t 2707853
 
6.9%
o 2302244
 
5.9%
r 2289239
 
5.8%
s 2040012
 
5.2%
1983985
 
5.1%
a 1925256
 
4.9%
c 1515443
 
3.9%
Other values (45) 12567849
32.1%

CONTRIBUTING FACTOR VEHICLE 2
Text

MISSING

Distinct61
Distinct (%)< 0.1%
Missing307909
Missing (%)15.3%
Memory size15.4 MiB

Length

Max length53
Median length11
Mean length13.0438674
Min length1

Characters and Unicode

Total characters22309396
Distinct characters55
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified
ValueCountFrequency (%)
unspecified 1440015
68.7%
driver 98225
 
4.7%
inattention/distraction 91712
 
4.4%
other 32434
 
1.5%
vehicular 31373
 
1.5%
too 26844
 
1.3%
closely 26844
 
1.3%
to 21040
 
1.0%
passing 20921
 
1.0%
lane 19538
 
0.9%
Other values (96) 288274
 
13.7%

Most occurring characters

ValueCountFrequency (%)
i 3517830
15.8%
e 3423220
15.3%
n 1999138
9.0%
s 1714454
7.7%
c 1625212
7.3%
d 1511422
6.8%
p 1507909
6.8%
f 1494364
6.7%
U 1475432
6.6%
t 603217
 
2.7%
Other values (45) 3437198
15.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 19575504
87.7%
Uppercase Letter 2196560
 
9.8%
Space Separator 386884
 
1.7%
Other Punctuation 115746
 
0.5%
Dash Punctuation 34091
 
0.2%
Open Punctuation 281
 
< 0.1%
Close Punctuation 281
 
< 0.1%
Decimal Number 49
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 3517830
18.0%
e 3423220
17.5%
n 1999138
10.2%
s 1714454
8.8%
c 1625212
8.3%
d 1511422
7.7%
p 1507909
7.7%
f 1494364
7.6%
t 603217
 
3.1%
r 526160
 
2.7%
Other values (15) 1652578
8.4%
Uppercase Letter
ValueCountFrequency (%)
U 1475432
67.2%
D 218605
 
10.0%
I 123089
 
5.6%
C 51127
 
2.3%
F 47323
 
2.2%
T 43230
 
2.0%
O 43189
 
2.0%
V 40393
 
1.8%
P 36364
 
1.7%
L 27874
 
1.3%
Other values (12) 89934
 
4.1%
Decimal Number
ValueCountFrequency (%)
8 22
44.9%
0 22
44.9%
1 5
 
10.2%
Space Separator
ValueCountFrequency (%)
386884
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 115746
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 34091
100.0%
Open Punctuation
ValueCountFrequency (%)
( 281
100.0%
Close Punctuation
ValueCountFrequency (%)
) 281
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 21772064
97.6%
Common 537332
 
2.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 3517830
16.2%
e 3423220
15.7%
n 1999138
9.2%
s 1714454
7.9%
c 1625212
7.5%
d 1511422
6.9%
p 1507909
6.9%
f 1494364
6.9%
U 1475432
6.8%
t 603217
 
2.8%
Other values (37) 2899866
13.3%
Common
ValueCountFrequency (%)
386884
72.0%
/ 115746
 
21.5%
- 34091
 
6.3%
( 281
 
0.1%
) 281
 
0.1%
8 22
 
< 0.1%
0 22
 
< 0.1%
1 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 22309396
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 3517830
15.8%
e 3423220
15.3%
n 1999138
9.0%
s 1714454
7.7%
c 1625212
7.3%
d 1511422
6.8%
p 1507909
6.8%
f 1494364
6.7%
U 1475432
6.6%
t 603217
 
2.7%
Other values (45) 3437198
15.4%

CONTRIBUTING FACTOR VEHICLE 3
Text

MISSING

Distinct51
Distinct (%)< 0.1%
Missing1875114
Missing (%)92.9%
Memory size15.4 MiB

Length

Max length53
Median length11
Mean length11.65463107
Min length1

Characters and Unicode

Total characters1668139
Distinct characters55
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5
Unique (%)< 0.1%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified
ValueCountFrequency (%)
unspecified 133444
85.9%
other 2706
 
1.7%
vehicular 2666
 
1.7%
driver 2060
 
1.3%
too 1909
 
1.2%
closely 1909
 
1.2%
inattention/distraction 1885
 
1.2%
following 1859
 
1.2%
fatigued/drowsy 853
 
0.5%
pavement 394
 
0.3%
Other values (79) 5691
 
3.7%

Most occurring characters

ValueCountFrequency (%)
e 285087
17.1%
i 283874
17.0%
n 146282
8.8%
s 140137
8.4%
c 139607
8.4%
d 135503
8.1%
p 135033
8.1%
f 134321
8.1%
U 134069
8.0%
o 16543
 
1.0%
Other values (45) 117683
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1494483
89.6%
Uppercase Letter 158061
 
9.5%
Space Separator 12245
 
0.7%
Other Punctuation 3016
 
0.2%
Dash Punctuation 303
 
< 0.1%
Open Punctuation 12
 
< 0.1%
Close Punctuation 12
 
< 0.1%
Decimal Number 7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 285087
19.1%
i 283874
19.0%
n 146282
9.8%
s 140137
9.4%
c 139607
9.3%
d 135503
9.1%
p 135033
9.0%
f 134321
9.0%
o 16543
 
1.1%
t 15513
 
1.0%
Other values (15) 62583
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
U 134069
84.8%
D 5387
 
3.4%
O 3026
 
1.9%
F 2952
 
1.9%
V 2940
 
1.9%
I 2387
 
1.5%
C 2381
 
1.5%
T 2163
 
1.4%
P 670
 
0.4%
S 532
 
0.3%
Other values (12) 1554
 
1.0%
Decimal Number
ValueCountFrequency (%)
8 3
42.9%
0 3
42.9%
1 1
 
14.3%
Space Separator
ValueCountFrequency (%)
12245
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 3016
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 303
100.0%
Open Punctuation
ValueCountFrequency (%)
( 12
100.0%
Close Punctuation
ValueCountFrequency (%)
) 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1652544
99.1%
Common 15595
 
0.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 285087
17.3%
i 283874
17.2%
n 146282
8.9%
s 140137
8.5%
c 139607
8.4%
d 135503
8.2%
p 135033
8.2%
f 134321
8.1%
U 134069
8.1%
o 16543
 
1.0%
Other values (37) 102088
 
6.2%
Common
ValueCountFrequency (%)
12245
78.5%
/ 3016
 
19.3%
- 303
 
1.9%
( 12
 
0.1%
) 12
 
0.1%
8 3
 
< 0.1%
0 3
 
< 0.1%
1 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1668139
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 285087
17.1%
i 283874
17.0%
n 146282
8.8%
s 140137
8.4%
c 139607
8.4%
d 135503
8.1%
p 135033
8.1%
f 134321
8.1%
U 134069
8.0%
o 16543
 
1.0%
Other values (45) 117683
7.1%

CONTRIBUTING FACTOR VEHICLE 4
Text

MISSING

Distinct41
Distinct (%)0.1%
Missing1986122
Missing (%)98.4%
Memory size15.4 MiB

Length

Max length43
Median length11
Mean length11.48706534
Min length5

Characters and Unicode

Total characters368999
Distinct characters51
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7
Unique (%)< 0.1%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified
ValueCountFrequency (%)
unspecified 30317
88.3%
other 584
 
1.7%
vehicular 575
 
1.7%
too 374
 
1.1%
closely 374
 
1.1%
following 369
 
1.1%
driver 293
 
0.9%
inattention/distraction 266
 
0.8%
fatigued/drowsy 170
 
0.5%
pavement 113
 
0.3%
Other values (64) 911
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e 64025
17.4%
i 63468
17.2%
n 32294
8.8%
c 31426
8.5%
s 31411
8.5%
d 30663
8.3%
p 30657
8.3%
f 30435
8.2%
U 30412
8.2%
o 2938
 
0.8%
Other values (41) 21270
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 331505
89.8%
Uppercase Letter 34758
 
9.4%
Space Separator 2223
 
0.6%
Other Punctuation 471
 
0.1%
Dash Punctuation 34
 
< 0.1%
Open Punctuation 4
 
< 0.1%
Close Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 64025
19.3%
i 63468
19.1%
n 32294
9.7%
c 31426
9.5%
s 31411
9.5%
d 30663
9.2%
p 30657
9.2%
f 30435
9.2%
o 2938
 
0.9%
r 2638
 
0.8%
Other values (15) 11550
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
U 30412
87.5%
D 837
 
2.4%
O 637
 
1.8%
V 621
 
1.8%
F 583
 
1.7%
C 434
 
1.2%
T 403
 
1.2%
I 336
 
1.0%
S 139
 
0.4%
P 136
 
0.4%
Other values (11) 220
 
0.6%
Space Separator
ValueCountFrequency (%)
2223
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 471
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 34
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 366263
99.3%
Common 2736
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 64025
17.5%
i 63468
17.3%
n 32294
8.8%
c 31426
8.6%
s 31411
8.6%
d 30663
8.4%
p 30657
8.4%
f 30435
8.3%
U 30412
8.3%
o 2938
 
0.8%
Other values (36) 18534
 
5.1%
Common
ValueCountFrequency (%)
2223
81.2%
/ 471
 
17.2%
- 34
 
1.2%
( 4
 
0.1%
) 4
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 368999
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 64025
17.4%
i 63468
17.2%
n 32294
8.8%
c 31426
8.5%
s 31411
8.5%
d 30663
8.3%
p 30657
8.3%
f 30435
8.2%
U 30412
8.2%
o 2938
 
0.8%
Other values (41) 21270
 
5.8%

CONTRIBUTING FACTOR VEHICLE 5
Text

MISSING

Distinct30
Distinct (%)0.3%
Missing2009575
Missing (%)99.6%
Memory size15.4 MiB

Length

Max length43
Median length11
Mean length11.46758939
Min length5

Characters and Unicode

Total characters99424
Distinct characters50
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique11
Unique (%)0.1%

Sample

1st rowUnspecified
2nd rowUnspecified
3rd rowUnspecified
4th rowUnspecified
5th rowUnspecified
ValueCountFrequency (%)
unspecified 8177
88.3%
other 168
 
1.8%
vehicular 166
 
1.8%
too 91
 
1.0%
closely 91
 
1.0%
following 89
 
1.0%
driver 73
 
0.8%
inattention/distraction 63
 
0.7%
pavement 48
 
0.5%
slippery 47
 
0.5%
Other values (47) 247
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e 17320
17.4%
i 17093
17.2%
n 8685
8.7%
c 8481
8.5%
s 8434
8.5%
p 8299
8.3%
d 8262
8.3%
f 8204
8.3%
U 8200
8.2%
o 729
 
0.7%
Other values (40) 5717
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 89345
89.9%
Uppercase Letter 9359
 
9.4%
Space Separator 590
 
0.6%
Other Punctuation 115
 
0.1%
Dash Punctuation 11
 
< 0.1%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 17320
19.4%
i 17093
19.1%
n 8685
9.7%
c 8481
9.5%
s 8434
9.4%
p 8299
9.3%
d 8262
9.2%
f 8204
9.2%
o 729
 
0.8%
r 715
 
0.8%
Other values (15) 3123
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
U 8200
87.6%
D 204
 
2.2%
O 184
 
2.0%
V 179
 
1.9%
F 142
 
1.5%
C 103
 
1.1%
T 97
 
1.0%
I 87
 
0.9%
S 57
 
0.6%
P 51
 
0.5%
Other values (10) 55
 
0.6%
Space Separator
ValueCountFrequency (%)
590
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 115
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 98704
99.3%
Common 720
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 17320
17.5%
i 17093
17.3%
n 8685
8.8%
c 8481
8.6%
s 8434
8.5%
p 8299
8.4%
d 8262
8.4%
f 8204
8.3%
U 8200
8.3%
o 729
 
0.7%
Other values (35) 4997
 
5.1%
Common
ValueCountFrequency (%)
590
81.9%
/ 115
 
16.0%
- 11
 
1.5%
( 2
 
0.3%
) 2
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 99424
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 17320
17.4%
i 17093
17.2%
n 8685
8.7%
c 8481
8.5%
s 8434
8.5%
p 8299
8.3%
d 8262
8.3%
f 8204
8.3%
U 8200
8.2%
o 729
 
0.7%
Other values (40) 5717
 
5.8%

COLLISION_ID
Real number (ℝ)

UNIQUE 

Distinct2018245
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3116454.661
Minimum22
Maximum4655026
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size15.4 MiB

Quantile statistics

Minimum22
5-th percentile101751.2
Q13140681
median3645346
Q34150156
95-th percentile4553890.8
Maximum4655026
Range4655004
Interquartile range (IQR)1009475
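
The range and interquartile range follow directly from the quantiles listed above. A short arithmetic check, with the values copied by hand from this section:

```python
# Quantile values copied from the COLLISION_ID quantile statistics.
minimum, maximum = 22, 4655026
q1, q3 = 3140681, 4150156

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1

print(value_range, iqr)
```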

Descriptive statistics

Standard deviation1503996.846
Coefficient of variation (CV)0.4825986609
Kurtosis-0.1124875456
Mean3116454.661
Median Absolute Deviation (MAD)504738
Skewness-1.204533211
Sum6.289769037 × 10^12
Variance2.262006513 × 10^12
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
4455765 1
 
< 0.1%
3215029 1
 
< 0.1%
3210593 1
 
< 0.1%
3210501 1
 
< 0.1%
3218613 1
 
< 0.1%
3212622 1
 
< 0.1%
3224369 1
 
< 0.1%
3213701 1
 
< 0.1%
3228991 1
 
< 0.1%
3224246 1
 
< 0.1%
Other values (2018235) 2018235
> 99.9%
ValueCountFrequency (%)
22 1
< 0.1%
23 1
< 0.1%
24 1
< 0.1%
25 1
< 0.1%
26 1
< 0.1%
ValueCountFrequency (%)
4655026 1
< 0.1%
4655023 1
< 0.1%
4655021 1
< 0.1%
4655019 1
< 0.1%
4655016 1
< 0.1%

VEHICLE TYPE CODE 1
Text

Distinct1562
Distinct (%)0.1%
Missing12677
Missing (%)0.6%
Memory size15.4 MiB

Length

Max length38
Median length35
Mean length16.90862938
Min length1

Characters and Unicode

Total characters33911406
Distinct characters75
Distinct categories11
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique944
Unique (%)< 0.1%

Sample

1st rowSedan
2nd rowSedan
3rd rowSedan
4th rowSedan
5th rowDump
ValueCountFrequency (%)
vehicle 860450
18.1%
utility 613996
12.9%
station 613956
12.9%
sedan 593505
12.5%
wagon/sport 433665
9.1%
passenger 416217
8.7%
181583
 
3.8%
wagon 180349
 
3.8%
sport 180291
 
3.8%
truck 83089
 
1.7%
Other values (918) 604471
12.7%

Most occurring characters

ValueCountFrequency (%)
2769224
 
8.2%
S 2669549
 
7.9%
t 2199918
 
6.5%
i 1853656
 
5.5%
E 1817948
 
5.4%
a 1551343
 
4.6%
e 1540361
 
4.5%
n 1481788
 
4.4%
o 1372162
 
4.0%
T 1136684
 
3.4%
Other values (65) 15518773
45.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 15401306
45.4%
Lowercase Letter 14949349
44.1%
Space Separator 2769224
 
8.2%
Other Punctuation 615298
 
1.8%
Decimal Number 70965
 
0.2%
Dash Punctuation 50031
 
0.1%
Open Punctuation 27616
 
0.1%
Close Punctuation 27613
 
0.1%
Modifier Symbol 2
 
< 0.1%
Other Symbol 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2669549
17.3%
E 1817948
11.8%
T 1136684
 
7.4%
I 1051994
 
6.8%
V 933305
 
6.1%
A 874896
 
5.7%
N 865293
 
5.6%
R 723411
 
4.7%
U 675992
 
4.4%
L 667524
 
4.3%
Other values (16) 3984710
25.9%
Lowercase Letter
ValueCountFrequency (%)
t 2199918
14.7%
i 1853656
12.4%
a 1551343
10.4%
e 1540361
10.3%
n 1481788
9.9%
o 1372162
9.2%
l 902170
6.0%
d 641319
 
4.3%
r 600544
 
4.0%
c 574275
 
3.8%
Other values (15) 2231813
14.9%
Decimal Number
ValueCountFrequency (%)
4 53397
75.2%
6 14403
 
20.3%
2 2675
 
3.8%
3 321
 
0.5%
1 55
 
0.1%
5 42
 
0.1%
0 36
 
0.1%
9 20
 
< 0.1%
8 9
 
< 0.1%
7 7
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 615272
> 99.9%
. 13
 
< 0.1%
# 6
 
< 0.1%
, 3
 
< 0.1%
' 2
 
< 0.1%
? 1
 
< 0.1%
& 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2769224
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 50031
100.0%
Open Punctuation
ValueCountFrequency (%)
( 27616
100.0%
Close Punctuation
ValueCountFrequency (%)
) 27613
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%
Other Symbol
ValueCountFrequency (%)
� 1
100.0%
Control
ValueCountFrequency (%)
 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 30350655
89.5%
Common 3560751
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2669549
 
8.8%
t 2199918
 
7.2%
i 1853656
 
6.1%
E 1817948
 
6.0%
a 1551343
 
5.1%
e 1540361
 
5.1%
n 1481788
 
4.9%
o 1372162
 
4.5%
T 1136684
 
3.7%
I 1051994
 
3.5%
Other values (41) 13675252
45.1%
Common
ValueCountFrequency (%)
2769224
77.8%
/ 615272
 
17.3%
4 53397
 
1.5%
- 50031
 
1.4%
( 27616
 
0.8%
) 27613
 
0.8%
6 14403
 
0.4%
2 2675
 
0.1%
3 321
 
< 0.1%
1 55
 
< 0.1%
Other values (14) 144
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 33911405
> 99.9%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2769224
 
8.2%
S 2669549
 
7.9%
t 2199918
 
6.5%
i 1853656
 
5.5%
E 1817948
 
5.4%
a 1551343
 
4.6%
e 1540361
 
4.5%
n 1481788
 
4.4%
o 1372162
 
4.0%
T 1136684
 
3.4%
Other values (64) 15518772
45.8%
Specials
ValueCountFrequency (%)
� 1
100.0%

VEHICLE TYPE CODE 2
Text

MISSING 

Distinct1739
Distinct (%)0.1%
Missing376990
Missing (%)18.7%
Memory size15.4 MiB

Length

Max length38
Median length30
Mean length16.11302509
Min length1

Characters and Unicode

Total characters26445583
Distinct characters72
Distinct categories9
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1034
Unique (%)0.1%

Sample

1st rowSedan
2nd rowPick-up Truck
3rd rowSedan
4th rowTractor Truck Diesel
5th rowSedan
ValueCountFrequency (%)
vehicle 642409
17.1%
utility 455446
12.1%
station 455422
12.1%
sedan 420741
11.2%
passenger 318610
8.5%
wagon/sport 315218
8.4%
141437
 
3.8%
wagon 140256
 
3.7%
sport 140204
 
3.7%
truck 82388
 
2.2%
Other values (971) 643153
17.1%

Most occurring characters

ValueCountFrequency (%)
2126995
 
8.0%
S 1993078
 
7.5%
t 1607094
 
6.1%
E 1437058
 
5.4%
i 1380800
 
5.2%
e 1145297
 
4.3%
a 1125945
 
4.3%
n 1069195
 
4.0%
o 1020212
 
3.9%
T 915148
 
3.5%
Other values (62) 12624761
47.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 12576139
47.6%
Lowercase Letter 11123251
42.1%
Space Separator 2126995
 
8.0%
Other Punctuation 456724
 
1.7%
Decimal Number 59135
 
0.2%
Dash Punctuation 50036
 
0.2%
Open Punctuation 26652
 
0.1%
Close Punctuation 26649
 
0.1%
Modifier Symbol 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 1993078
15.8%
E 1437058
11.4%
T 915148
 
7.3%
N 869229
 
6.9%
I 842049
 
6.7%
V 709350
 
5.6%
A 685071
 
5.4%
O 587828
 
4.7%
R 577759
 
4.6%
U 573837
 
4.6%
Other values (16) 3385732
26.9%
Lowercase Letter
ValueCountFrequency (%)
t 1607094
14.4%
i 1380800
12.4%
e 1145297
10.3%
a 1125945
10.1%
n 1069195
9.6%
o 1020212
9.2%
l 661121
 
5.9%
r 470682
 
4.2%
d 458514
 
4.1%
c 449963
 
4.0%
Other values (15) 1734428
15.6%
Decimal Number
ValueCountFrequency (%)
4 43057
72.8%
6 13694
 
23.2%
2 1958
 
3.3%
3 285
 
0.5%
0 53
 
0.1%
1 41
 
0.1%
5 27
 
< 0.1%
9 8
 
< 0.1%
8 7
 
< 0.1%
7 5
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 456704
> 99.9%
. 11
 
< 0.1%
' 3
 
< 0.1%
, 2
 
< 0.1%
# 2
 
< 0.1%
? 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2126995
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 50036
100.0%
Open Punctuation
ValueCountFrequency (%)
( 26652
100.0%
Close Punctuation
ValueCountFrequency (%)
) 26649
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 23699390
89.6%
Common 2746193
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 1993078
 
8.4%
t 1607094
 
6.8%
E 1437058
 
6.1%
i 1380800
 
5.8%
e 1145297
 
4.8%
a 1125945
 
4.8%
n 1069195
 
4.5%
o 1020212
 
4.3%
T 915148
 
3.9%
N 869229
 
3.7%
Other values (41) 11136334
47.0%
Common
ValueCountFrequency (%)
2126995
77.5%
/ 456704
 
16.6%
- 50036
 
1.8%
4 43057
 
1.6%
( 26652
 
1.0%
) 26649
 
1.0%
6 13694
 
0.5%
2 1958
 
0.1%
3 285
 
< 0.1%
0 53
 
< 0.1%
Other values (11) 110
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26445583
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2126995
 
8.0%
S 1993078
 
7.5%
t 1607094
 
6.1%
E 1437058
 
5.4%
i 1380800
 
5.2%
e 1145297
 
4.3%
a 1125945
 
4.3%
n 1069195
 
4.0%
o 1020212
 
3.9%
T 915148
 
3.5%
Other values (62) 12624761
47.7%

VEHICLE TYPE CODE 3
Text

MISSING 

Distinct246
Distinct (%)0.2%
Missing1880098
Missing (%)93.2%
Memory size15.4 MiB

Length

Max length35
Median length30
Mean length17.68346037
Min length2

Characters and Unicode

Total characters2442917
Distinct characters62
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique142
Unique (%)0.1%

Sample

1st rowSedan
2nd rowStation Wagon/Sport Utility Vehicle
3rd rowSedan
4th rowSedan
5th rowSedan
ValueCountFrequency (%)
vehicle 62326
18.5%
utility 47537
14.1%
station 47535
14.1%
sedan 44904
13.4%
wagon/sport 34176
10.2%
passenger 27716
8.2%
13436
 
4.0%
sport 13358
 
4.0%
wagon 13358
 
4.0%
truck 4094
 
1.2%
Other values (201) 27838
8.3%

Most occurring characters

ValueCountFrequency (%)
198566
 
8.1%
S 194466
 
8.0%
t 172204
 
7.0%
i 142271
 
5.8%
a 116663
 
4.8%
E 116377
 
4.8%
e 116165
 
4.8%
n 114066
 
4.7%
o 105313
 
4.3%
T 76669
 
3.1%
Other values (52) 1090157
44.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1127807
46.2%
Uppercase Letter 1060635
43.4%
Space Separator 198566
 
8.1%
Other Punctuation 47613
 
1.9%
Decimal Number 3640
 
0.1%
Dash Punctuation 2904
 
0.1%
Open Punctuation 876
 
< 0.1%
Close Punctuation 876
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 194466
18.3%
E 116377
11.0%
T 76669
 
7.2%
I 71395
 
6.7%
N 65711
 
6.2%
V 65393
 
6.2%
A 57912
 
5.5%
U 52569
 
5.0%
W 50920
 
4.8%
O 46577
 
4.4%
Other values (15) 262646
24.8%
Lowercase Letter
ValueCountFrequency (%)
t 172204
15.3%
i 142271
12.6%
a 116663
10.3%
e 116165
10.3%
n 114066
10.1%
o 105313
9.3%
l 69668
6.2%
d 47840
 
4.2%
r 42374
 
3.8%
c 41169
 
3.7%
Other values (14) 160074
14.2%
Decimal Number
ValueCountFrequency (%)
4 2998
82.4%
6 442
 
12.1%
2 185
 
5.1%
3 10
 
0.3%
8 2
 
0.1%
1 1
 
< 0.1%
0 1
 
< 0.1%
5 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
198566
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 47613
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2904
100.0%
Open Punctuation
ValueCountFrequency (%)
( 876
100.0%
Close Punctuation
ValueCountFrequency (%)
) 876
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2188442
89.6%
Common 254475
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 194466
 
8.9%
t 172204
 
7.9%
i 142271
 
6.5%
a 116663
 
5.3%
E 116377
 
5.3%
e 116165
 
5.3%
n 114066
 
5.2%
o 105313
 
4.8%
T 76669
 
3.5%
I 71395
 
3.3%
Other values (39) 962853
44.0%
Common
ValueCountFrequency (%)
198566
78.0%
/ 47613
 
18.7%
4 2998
 
1.2%
- 2904
 
1.1%
( 876
 
0.3%
) 876
 
0.3%
6 442
 
0.2%
2 185
 
0.1%
3 10
 
< 0.1%
8 2
 
< 0.1%
Other values (3) 3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2442917
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
198566
 
8.1%
S 194466
 
8.0%
t 172204
 
7.0%
i 142271
 
5.8%
a 116663
 
4.8%
E 116377
 
4.8%
e 116165
 
4.8%
n 114066
 
4.7%
o 105313
 
4.3%
T 76669
 
3.1%
Other values (52) 1090157
44.6%

VEHICLE TYPE CODE 4
Text

MISSING 

Distinct99
Distinct (%)0.3%
Missing1987193
Missing (%)98.5%
Memory size15.4 MiB

Length

Max length35
Median length30
Mean length17.95169393
Min length2

Characters and Unicode

Total characters557436
Distinct characters57
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique43
Unique (%)0.1%

Sample

1st rowStation Wagon/Sport Utility Vehicle
2nd rowSedan
3rd rowStation Wagon/Sport Utility Vehicle
4th rowSedan
5th rowSedan
ValueCountFrequency (%)
vehicle 14336
18.9%
utility 11162
14.7%
station 11162
14.7%
sedan 10798
14.2%
wagon/sport 8310
10.9%
passenger 5969
7.9%
2859
 
3.8%
sport 2852
 
3.8%
wagon 2852
 
3.8%
truck 744
 
1.0%
Other values (101) 4929
 
6.5%

Most occurring characters

ValueCountFrequency (%)
44977
 
8.1%
S 44695
 
8.0%
t 41751
 
7.5%
i 34281
 
6.1%
a 28042
 
5.0%
e 27840
 
5.0%
n 27549
 
4.9%
o 25366
 
4.6%
E 24666
 
4.4%
l 16836
 
3.0%
Other values (47) 241433
43.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 270351
48.5%
Uppercase Letter 229384
41.1%
Space Separator 44977
 
8.1%
Other Punctuation 11169
 
2.0%
Decimal Number 726
 
0.1%
Dash Punctuation 601
 
0.1%
Open Punctuation 114
 
< 0.1%
Close Punctuation 114
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 44695
19.5%
E 24666
10.8%
T 15986
 
7.0%
I 15047
 
6.6%
V 14816
 
6.5%
N 13718
 
6.0%
A 12213
 
5.3%
U 12053
 
5.3%
W 11770
 
5.1%
O 9650
 
4.2%
Other values (14) 54770
23.9%
Lowercase Letter
ValueCountFrequency (%)
t 41751
15.4%
i 34281
12.7%
a 28042
10.4%
e 27840
10.3%
n 27549
10.2%
o 25366
9.4%
l 16836
6.2%
d 11436
 
4.2%
r 9863
 
3.6%
c 9621
 
3.6%
Other values (13) 37766
14.0%
Decimal Number
ValueCountFrequency (%)
4 623
85.8%
6 58
 
8.0%
2 42
 
5.8%
3 2
 
0.3%
5 1
 
0.1%
Space Separator
ValueCountFrequency (%)
44977
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 11169
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 601
100.0%
Open Punctuation
ValueCountFrequency (%)
( 114
100.0%
Close Punctuation
ValueCountFrequency (%)
) 114
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 499735
89.6%
Common 57701
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 44695
 
8.9%
t 41751
 
8.4%
i 34281
 
6.9%
a 28042
 
5.6%
e 27840
 
5.6%
n 27549
 
5.5%
o 25366
 
5.1%
E 24666
 
4.9%
l 16836
 
3.4%
T 15986
 
3.2%
Other values (37) 212723
42.6%
Common
ValueCountFrequency (%)
44977
77.9%
/ 11169
 
19.4%
4 623
 
1.1%
- 601
 
1.0%
( 114
 
0.2%
) 114
 
0.2%
6 58
 
0.1%
2 42
 
0.1%
3 2
 
< 0.1%
5 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 557436
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
44977
 
8.1%
S 44695
 
8.0%
t 41751
 
7.5%
i 34281
 
6.1%
a 28042
 
5.0%
e 27840
 
5.0%
n 27549
 
4.9%
o 25366
 
4.6%
E 24666
 
4.4%
l 16836
 
3.0%
Other values (47) 241433
43.3%

VEHICLE TYPE CODE 5
Text

MISSING 

Distinct67
Distinct (%)0.8%
Missing2009835
Missing (%)99.6%
Memory size15.4 MiB

Length

Max length35
Median length30
Mean length18.23008323
Min length2

Characters and Unicode

Total characters153315
Distinct characters54
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique28
Unique (%)0.3%

Sample

1st rowStation Wagon/Sport Utility Vehicle
2nd rowStation Wagon/Sport Utility Vehicle
3rd rowSedan
4th rowSedan
5th rowStation Wagon/Sport Utility Vehicle
ValueCountFrequency (%)
vehicle 3859
18.5%
station 3165
15.2%
utility 3165
15.2%
sedan 2989
14.3%
wagon/sport 2363
11.3%
passenger 1487
 
7.1%
804
 
3.9%
wagon 804
 
3.9%
sport 802
 
3.8%
truck 233
 
1.1%
Other values (63) 1164
 
5.6%

Most occurring characters

ValueCountFrequency (%)
12435
 
8.1%
S 12204
 
8.0%
t 11882
 
7.8%
i 9751
 
6.4%
a 7881
 
5.1%
e 7836
 
5.1%
n 7768
 
5.1%
o 7235
 
4.7%
E 6126
 
4.0%
l 4789
 
3.1%
Other values (44) 65408
42.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 76600
50.0%
Uppercase Letter 60724
39.6%
Space Separator 12435
 
8.1%
Other Punctuation 3167
 
2.1%
Dash Punctuation 182
 
0.1%
Decimal Number 161
 
0.1%
Close Punctuation 23
 
< 0.1%
Open Punctuation 23
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 12204
20.1%
E 6126
10.1%
T 4486
 
7.4%
I 4008
 
6.6%
V 3967
 
6.5%
N 3429
 
5.6%
U 3336
 
5.5%
W 3264
 
5.4%
A 3211
 
5.3%
O 2625
 
4.3%
Other values (13) 14068
23.2%
Lowercase Letter
ValueCountFrequency (%)
t 11882
15.5%
i 9751
12.7%
a 7881
10.3%
e 7836
10.2%
n 7768
10.1%
o 7235
9.4%
l 4789
6.3%
d 3134
 
4.1%
r 2797
 
3.7%
c 2783
 
3.6%
Other values (12) 10744
14.0%
Decimal Number
ValueCountFrequency (%)
4 133
82.6%
2 14
 
8.7%
6 13
 
8.1%
3 1
 
0.6%
Space Separator
ValueCountFrequency (%)
12435
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 3167
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 182
100.0%
Close Punctuation
ValueCountFrequency (%)
) 23
100.0%
Open Punctuation
ValueCountFrequency (%)
( 23
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 137324
89.6%
Common 15991
 
10.4%

Most frequent character per script

Latin
Value                Count    Frequency (%)
S                    12204    8.9%
t                    11882    8.7%
i                    9751     7.1%
a                    7881     5.7%
e                    7836     5.7%
n                    7768     5.7%
o                    7235     5.3%
E                    6126     4.5%
l                    4789     3.5%
T                    4486     3.3%
Other values (35)    57366    41.8%

Common
Value                Count    Frequency (%)
(space)              12435    77.8%
/                    3167     19.8%
-                    182      1.1%
4                    133      0.8%
)                    23       0.1%
(                    23       0.1%
2                    14       0.1%
6                    13       0.1%
3                    1        < 0.1%

Most occurring blocks

Value                Count     Frequency (%)
ASCII                153315    100.0%

Most frequent character per block

ASCII
Value                Count    Frequency (%)
(space)              12435    8.1%
S                    12204    8.0%
t                    11882    7.8%
i                    9751     6.4%
a                    7881     5.1%
e                    7836     5.1%
n                    7768     5.1%
o                    7235     4.7%
E                    6126     4.0%
l                    4789     3.1%
Other values (44)    65408    42.7%

VEHICLE COMBINATION
Text

MISSING 

Distinct: 6896
Distinct (%): 0.4%
Missing: 377001
Missing (%): 18.7%
Memory size: 15.4 MiB
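
The percentages above can be reproduced from the counts. The sketch below is a minimal check, assuming (an inference from the numbers, not something the report states) that Missing (%) is taken over all 2018245 observations while Distinct (%) is taken over the non-missing values only.

```python
# Counts copied from the report. The choice of denominators is an assumption
# that happens to reproduce the rounded percentages shown above.
n_observations = 2_018_245
n_missing = 377_001
n_distinct = 6_896

n_present = n_observations - n_missing            # non-missing values

missing_pct = 100 * n_missing / n_observations    # share of all rows
distinct_pct = 100 * n_distinct / n_present       # share of non-missing rows

print(f"present={n_present} missing={missing_pct:.1f}% distinct={distinct_pct:.1f}%")
```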

Length

Max length: 76
Median length: 62
Mean length: 36.19103863
Min length: 7
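
Length statistics of this kind can be computed from the string lengths of the non-missing values. The following is a minimal sketch using only the Python standard library; the `values` list is a hypothetical sample built from the sample rows of this variable, and skipping missing entries before measuring is an assumption about how the report treats them.

```python
from statistics import mean, median

# Hypothetical sample; None stands in for a missing value.
values = [
    "Sedan & Sedan",
    "Sedan & Pick-up Truck",
    "Dump & Sedan",
    "Sedan & Tractor Truck Diesel",
    None,
]

# Missing entries are skipped before measuring length.
lengths = [len(v) for v in values if v is not None]

summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)
```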

Characters and Unicode

Total characters: 59398325
Distinct characters: 75
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
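
As an illustration of what such a character property looks like in practice, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. The helper name `category_counts` and the two sample strings are illustrative assumptions, not part of the report; the two-letter codes correspond to the category names used in this report (`Lu` = Uppercase Letter, `Ll` = Lowercase Letter, `Zs` = Space Separator, `Po` = Other Punctuation, `Pd` = Dash Punctuation).

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Lu', 'Ll', 'Zs')."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two values taken from the sample rows of this variable.
sample = ["Sedan & Sedan", "Sedan & Pick-up Truck"]
counts = category_counts(sample)
print(counts.most_common())
```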

Unique

Unique: 4090
Unique (%): 0.2%

Sample

1st row: Sedan & Sedan
2nd row: Sedan & Pick-up Truck
3rd row: Dump & Sedan
4th row: Sedan & Tractor Truck Diesel
5th row: Sedan & Sedan
Value                  Count      Frequency (%)
(blank)                1948098    20.8%
vehicle                1363891    14.6%
utility                953605     10.2%
station                953548     10.2%
sedan                  867714     9.3%
passenger              696733     7.5%
wagon/sport            649058     6.9%
wagon                  304589     3.3%
sport                  304490     3.3%
truck                  152985     1.6%
Other values (1286)    1153821    12.3%
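
A word-frequency table like the one above can be approximated by lowercasing each value and splitting it on whitespace. This is only a sketch: the report does not state its tokenization rule here, so the rule used below is an assumption, and it will not reproduce details such as the blank entry at the top of the table. The function name `word_counts` and the sample values are hypothetical.

```python
from collections import Counter


def word_counts(values):
    """Count lowercase, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# Hypothetical sample in the style of this variable's values.
sample = [
    "Sedan & Sedan",
    "Sedan & Pick-up Truck",
    "Station Wagon/Sport Utility Vehicle & Sedan",
]
counts = word_counts(sample)
print(counts.most_common(3))
```

Note that splitting on whitespace keeps `wagon/sport` as a single token, which matches how that token appears in the table.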

Most occurring characters

Value                Count       Frequency (%)
(space)              7732432     13.0%
S                    4210722     7.1%
t                    3299762     5.6%
E                    3086019     5.2%
i                    2808213     4.7%
e                    2317561     3.9%
a                    2315807     3.9%
n                    2200992     3.7%
o                    2075497     3.5%
T                    1939228     3.3%
Other values (65)    27412092    46.1%

Most occurring categories

Value                Count       Frequency (%)
Uppercase Letter     26125391    44.0%
Lowercase Letter     22629558    38.1%
Space Separator      7732432     13.0%
Other Punctuation    2597271     4.4%
Decimal Number       119348      0.2%
Dash Punctuation     90358       0.2%
Open Punctuation     51983       0.1%
Close Punctuation    51978       0.1%
Modifier Symbol      4           < 0.1%
Other Symbol         1           < 0.1%

Most frequent character per category

Uppercase Letter
Value                Count      Frequency (%)
S                    4210722    16.1%
E                    3086019    11.8%
T                    1939228    7.4%
I                    1799254    6.9%
N                    1644443    6.3%
V                    1496260    5.7%
A                    1480190    5.7%
R                    1234828    4.7%
O                    1175045    4.5%
L                    1146833    4.4%
Other values (16)    6912569    26.5%

Lowercase Letter
Value                Count      Frequency (%)
t                    3299762    14.6%
i                    2808213    12.4%
e                    2317561    10.2%
a                    2315807    10.2%
n                    2200992    9.7%
o                    2075497    9.2%
l                    1356191    6.0%
d                    943862     4.2%
r                    940138     4.2%
c                    896528     4.0%
Other values (15)    3475007    15.4%

Decimal Number
Value                Count      Frequency (%)
4                    87575      73.4%
6                    26854      22.5%
2                    4087       3.4%
3                    550        0.5%
1                    86         0.1%
0                    79         0.1%
5                    62         0.1%
9                    27         < 0.1%
8                    16         < 0.1%
7                    12         < 0.1%

Other Punctuation
Value                Count      Frequency (%)
&                    1641245    63.2%
/                    955991     36.8%
.                    18         < 0.1%
#                    7          < 0.1%
,                    4          < 0.1%
'                    3          < 0.1%
?                    3          < 0.1%

Space Separator
Value                Count      Frequency (%)
(space)              7732432    100.0%

Dash Punctuation
Value                Count      Frequency (%)
-                    90358      100.0%

Open Punctuation
Value                Count      Frequency (%)
(                    51983      100.0%

Close Punctuation
Value                Count      Frequency (%)
)                    51978      100.0%

Modifier Symbol
Value                Count      Frequency (%)
`                    4          100.0%

Other Symbol
Value                Count      Frequency (%)
�                    1          100.0%
Control
Value                              Count    Frequency (%)
(non-printable control character)  1        100.0%

Most occurring scripts

Value                Count       Frequency (%)
Latin                48754949    82.1%
Common               10643376    17.9%

Most frequent character per script

Latin
Value                Count       Frequency (%)
S                    4210722     8.6%
t                    3299762     6.8%
E                    3086019     6.3%
i                    2808213     5.8%
e                    2317561     4.8%
a                    2315807     4.7%
n                    2200992     4.5%
o                    2075497     4.3%
T                    1939228     4.0%
I                    1799254     3.7%
Other values (41)    22701894    46.6%

Common
Value                Count       Frequency (%)
(space)              7732432     72.7%
&                    1641245     15.4%
/                    955991      9.0%
-                    90358       0.8%
4                    87575       0.8%
(                    51983       0.5%
)                    51978       0.5%
6                    26854       0.3%
2                    4087        < 0.1%
3                    550         < 0.1%
Other values (14)    323         < 0.1%

Most occurring blocks

Value                Count       Frequency (%)
ASCII                59398324    > 99.9%
Specials             1           < 0.1%

Most frequent character per block

ASCII
Value                Count       Frequency (%)
(space)              7732432     13.0%
S                    4210722     7.1%
t                    3299762     5.6%
E                    3086019     5.2%
i                    2808213     4.7%
e                    2317561     3.9%
a                    2315807     3.9%
n                    2200992     3.7%
o                    2075497     3.5%
T                    1939228     3.3%
Other values (64)    27412091    46.1%

Specials
Value                Count       Frequency (%)
�                    1           100.0%
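
The single character in the Specials block and the single Control character reported for this variable are likely candidates for cleaning. The sketch below shows one way to locate such values, assuming the Specials character is U+FFFD (the replacement character, as displayed above). The helper name `suspicious_chars` and the sample strings are hypothetical and do not come from the dataset.

```python
import unicodedata

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, assumed to be the Specials character above


def suspicious_chars(value):
    """Return characters that are U+FFFD or in category Cc (control)."""
    return [
        ch
        for ch in value
        if ch == REPLACEMENT_CHAR or unicodedata.category(ch) == "Cc"
    ]


# Invented sample values; the second and third contain injected bad characters.
sample = ["Sedan & Sedan", "Sedan & Tax\ufffd", "Van\x7f & Sedan"]

# Map each affected value to the offending characters it contains.
flagged = {v: suspicious_chars(v) for v in sample if suspicious_chars(v)}
for value, chars in flagged.items():
    print(repr(value), [f"U+{ord(ch):04X}" for ch in chars])
```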